A New Approach for Automatic Chinese Spelling Correction

Author

  • Chao-Huang Chang
Abstract

This article presents a new approach to automatic Chinese spelling error detection and correction. Existing Chinese spelling checkers have two problems: (1) a low precision rate and (2) a lack of correction capability. The proposed method combines two mechanisms: (1) composite confusing character substitution and (2) an advanced word class bigram language model. The characters in the input sentence are first substituted, one by one, by the members of their corresponding composite confusing character sets. A composite confusing set collects the characters that are similar to a given Chinese character in shape, pronunciation, meaning, or input keystroke sequence. The substitution step produces several sentence hypotheses for the input sentence. An advanced word class bigram language model, such as the inter-word character bigram (IWCB) or SA-class bigram, then scores each sentence hypothesis. Finally, the best-scoring hypothesis is compared with the input sentence to determine the typos and their corrections. Experiments show that the proposed approach is very effective in dealing with both problems.
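The substitute, score, and compare pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the confusion sets, the probabilities, and the names `CONFUSION_SETS`, `BIGRAM_PROB`, `hypotheses`, `score`, and `correct` are hypothetical, and a plain character bigram table stands in for the IWCB and SA-class bigram models, which the abstract names but does not define. The sketch also tries only one substitution per hypothesis.

```python
import math

# Hypothetical confusion sets (toy data, not taken from the paper):
# each character maps to characters it is easily confused with.
CONFUSION_SETS = {"在": ["再"], "再": ["在"]}

# Toy character bigram probabilities standing in for a trained language
# model. The paper uses word class bigram models (IWCB, SA-class) instead.
BIGRAM_PROB = {
    ("<s>", "我"): 0.2,
    ("我", "们"): 0.5,
    ("们", "再"): 0.05,
    ("们", "在"): 0.1,
    ("再", "见"): 0.4,
    ("在", "见"): 0.001,
    ("见", "</s>"): 0.3,
}
UNSEEN_PROB = 1e-6  # floor probability for bigrams missing from the table


def hypotheses(sentence):
    """Yield the input plus every single-character substitution of it."""
    yield sentence
    for i, ch in enumerate(sentence):
        for alt in CONFUSION_SETS.get(ch, []):
            yield sentence[:i] + alt + sentence[i + 1:]


def score(sentence):
    """Sum of log bigram probabilities, with sentence boundary markers."""
    tokens = ["<s>", *sentence, "</s>"]
    return sum(
        math.log(BIGRAM_PROB.get(pair, UNSEEN_PROB))
        for pair in zip(tokens, tokens[1:])
    )


def correct(sentence):
    """Return the best-scoring hypothesis and the (index, typo, fix) edits."""
    best = max(hypotheses(sentence), key=score)
    edits = [
        (i, old, new)
        for i, (old, new) in enumerate(zip(sentence, best))
        if old != new
    ]
    return best, edits


if __name__ == "__main__":
    print(correct("我们在见"))
```

Comparing the winning hypothesis with the input position by position is what turns a scorer into a corrector: a differing position is both the detected typo and its proposed correction, and an empty edit list means the input was judged correct.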

Similar resources

A Unified Approach to Transliteration-based Text Input with Online Spelling Correction

This paper presents an integrated, end-to-end approach to online spelling correction for text input. Online spelling correction refers to the spelling correction as you type, as opposed to post-editing. The online scenario is particularly important for languages that routinely use transliteration-based text input methods, such as Chinese and Japanese, because the desired target characters canno...

A New Statistical Approach To Chinese Pinyin Input

Chinese input is one of the key challenges for Chinese PC users. This paper proposes a statistical approach to Pinyin-based Chinese input. The approach uses a trigram-based language model and statistically based segmentation. To deal with real input, it also includes a typing model which enables spelling correction in sentence-based Pinyin input, and a spelling model for English which ...

Automatic Rule Acquisition for Spelling Correction

This paper describes a new approach to automatically learning linguistic knowledge for spelling correction. A major feature of this approach is the fact that the acquired knowledge is captured in a small set of easily understood rules, as opposed to a large set of opaque features and weights. A perspicuous representation is advantageous in order to best exploit human intuition to understand and...

Extended HMM and Ranking Models for Chinese Spelling Correction

Spelling correction, which has been studied for many decades, can be classified into two categories: (1) regular text spelling correction and (2) query spelling correction. Although the two tasks share many common techniques, they have different concerns. This paper presents our work on the CLP-2014 bake-off. The task focuses on spelling checking of Chinese essays written by foreigners. Compared to online sear...

A Ranker for Semantic Spell Checking Using Context-Sensitive Features

Nowadays, a large volume of documents is generated daily. Because these documents are written by many different people, they contain spelling errors, which lower their quality. Automatic writing assistance tools such as spell checkers and correctors can therefore help to improve that quality. Context-sensitive are misspelled words that have been...

Publication date: 2007